Selector validation #162

raxbg · 2019-02-25T11:03:44Z

The way we currently handle extra closing block brackets } is not perfect and still causes issues: we simply ignore them, which does not match the browsers' behavior. Depending on the case, an extra closing bracket might end up as part of the selector. Take a look at the following:

@keyframes mymove {
  from { top: 0px; }
}

#test {
  color: white;
  background: green;
}

body
  background: black;
  }

#test {
  display: block;
  background: red;
  color: white;
}
#test {
  display: block;
  background: white;
  color: black;
}

The body selector is missing an opening {, so the selector string will be everything up to the next {, which is found at the second #test { declaration (the first one after body). This makes the selector invalid, so that second definition of #test must be skipped, but parsing must continue. The final tree must include the @keyframes rule, the first #test definition and the last #test definition. The rule in between must be skipped.

This is why I think we should have a selector validation and skip the rule sets for invalid selectors (which matches the browsers' behavior).
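The recovery behavior described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: keptPreludes() and isPlausiblePrelude() are hypothetical names, the validity check is a crude stand-in for real selector validation, and quoted strings and comments are ignored for brevity.

```php
<?php
// Hypothetical stand-in for selector validation: a prelude that contains
// ";" or "}" can only come from a rule that lost its opening "{".
function isPlausiblePrelude(string $prelude): bool
{
    return $prelude !== '' && strpbrk($prelude, ';}') === false;
}

// Returns the preludes (selectors or at-rule headers) of the top-level
// rules that survive; rules with an implausible prelude are skipped whole.
function keptPreludes(string $css): array
{
    $kept = [];
    $pos = 0;
    $length = strlen($css);
    while (($open = strpos($css, '{', $pos)) !== false) {
        // Everything between the previous block and this "{" is the prelude.
        $prelude = trim(preg_replace('/\s+/', ' ', substr($css, $pos, $open - $pos)));
        // Consume the block, tracking nesting so @keyframes bodies stay intact.
        $depth = 1;
        $i = $open + 1;
        while ($i < $length && $depth > 0) {
            if ($css[$i] === '{') {
                $depth++;
            } elseif ($css[$i] === '}') {
                $depth--;
            }
            $i++;
        }
        if (isPlausiblePrelude($prelude)) {
            $kept[] = $prelude;
        }
        // Continue after the block either way, so parsing never aborts.
        $pos = $i;
    }
    return $kept;
}
```

Under these assumptions, running keptPreludes() on the stylesheet above would keep the @keyframes rule, the first #test and the last #test, and drop the rule whose prelude starts with body.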

The modifications in this PR will change the output of the library, so this change might be considered a BC break, but I believe it is a move in the right direction. I tried to build the selector validation according to the definition found in this document, more specifically this line: "In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_)". It is not perfect yet, because it doesn't check for escaped characters or for a valid start of the selector. I might try to add these if we agree that this is a viable approach to the problem.
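As a rough illustration of the quoted rule alone, a character check could look like the sketch below. hasOnlyIdentifierCharacters() is a hypothetical helper, not part of the library, and it shares the limitations mentioned above: no escapes and no check on how the identifier starts.

```php
<?php
// Hypothetical helper illustrating the quoted identifier rule only.
function hasOnlyIdentifierCharacters(string $identifier): bool
{
    // \x{00A0}-\x{10FFFF} covers "U+00A0 and higher"; the u flag makes PCRE
    // treat both the pattern and the subject as UTF-8.
    return preg_match('/^[a-zA-Z0-9\x{00A0}-\x{10FFFF}_-]+$/u', $identifier) === 1;
}
```

A name such as my-class_1 passes this check, while anything containing {, } or ; does not.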

Let me know what you think.

sabberworm · 2019-05-28T16:25:23Z

Thanks! I did start work on proper selector parsing some time ago but did not have time to finish it. So for now this will have to do. Does it cover cases like:

.this-selector [should="be-{"] .valid {
}

and

.this-selector /* should remain-} */ .valid {
}

?

Also see the ignored test case in https://github.com/sabberworm/PHP-CSS-Parser/blob/a90142fa0c664db18ea01948a61466bf1ff79220/tests/files/-tobedone.css#L1.

raxbg · 2019-05-28T16:45:29Z

Just tested how Chrome behaves in these cases and the first example is actually invalid. The second one, however, should be valid but it is currently not. I will try to fix this soon.

sabberworm · 2019-06-02T17:58:36Z

"Just tested how Chrome behaves in these cases and the first example is actually invalid."

No it isn’t. See https://codepen.io/anon/pen/GazJYb

Works flawlessly on Firefox and Chrome.
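Both examples hinge on not mistaking a brace inside a quoted string or a comment for block syntax. One way to locate the real block start is sketched below; findBlockStart() is a hypothetical helper written for this discussion, not the approach taken in the PR.

```php
<?php
// Hypothetical helper: returns the offset of the first "{" that is not inside
// a quoted string or a /* comment */, or null if there is none.
function findBlockStart(string $css): ?int
{
    $length = strlen($css);
    for ($i = 0; $i < $length; $i++) {
        $char = $css[$i];
        if ($char === '\\') {
            $i++; // Skip the escaped character.
        } elseif ($char === '"' || $char === "'") {
            // Advance to the matching closing quote, honouring backslash escapes.
            for ($i++; $i < $length && $css[$i] !== $char; $i++) {
                if ($css[$i] === '\\') {
                    $i++;
                }
            }
        } elseif ($char === '/' && $i + 1 < $length && $css[$i + 1] === '*') {
            $end = strpos($css, '*/', $i + 2);
            if ($end === false) {
                return null; // An unterminated comment swallows the rest.
            }
            $i = $end + 1; // The loop increment then moves past the "/".
        } elseif ($char === '{') {
            return $i;
        }
    }
    return null;
}
```

For both selectors above, this returns the offset of the final { rather than the brace inside the attribute value or the comment.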

raxbg · 2019-07-12T19:19:17Z

Just pushed an update which resolves the pending issues. Please take another look.

sabberworm

Thanks. I think this looks good, but I do have some reservations about this huge regex, which is too complicated. Could you maybe make it more readable?

sabberworm · 2019-07-13T09:53:53Z

lib/Sabberworm/CSS/Property/Selector.php

 	public function __construct($sSelector, $bCalculateSpecificity = false) {
+		if (!Selector::isValid($sSelector)) {
+			throw new UnexpectedTokenException("Selector did not match '" . self::SELECTOR_VALIDATION_RX . "'.", $sSelector, "custom");


I’m not sure it is the job of the Selector class to validate its input. I think it’s the job of the selector parsing logic (i.e. DeclarationBlock::parse) to do that.

sabberworm · 2019-07-13T09:59:03Z

lib/Sabberworm/CSS/Property/Selector.php

@@ -35,10 +37,21 @@ class Selector {
 	))
 	/ix';

+	const SELECTOR_VALIDATION_RX = '/
+	^((?:[a-zA-Z0-9\x{00A0}-\x{FFFF}_\^\$\|\*\=\"\'\~\[\]\(\)\-\s\.:#\+\>]*(?:\\\\.)?(?:\'.*?\')?(?:\".*?\")?)*|\s*?[\+-]?\d+\%\s*)$


This regex is too complicated to understand. If it can’t be simplified, maybe a regex isn’t the right tool for the job. But since the end goal for selectors still is a complete parser (which will make the regex obsolete), I’ll allow it for now. But maybe you could split the regex across multiple lines using concatenation and comment each line so we can better understand what it does.

Also, some of the escapes are unnecessary. Inside character classes [], you only need to escape ] and - (in some cases) (and maybe [ to be symmetrical), but ^, $, |, *, =, ", ~, +, (, ) can be left literal (' was already literal since \' is a string escape, not a regex escape).
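To make the two suggestions concrete, here is a sketch of what a concatenated, commented pattern with fewer escapes might look like. SKETCH_SELECTOR_RX and looksLikeSelector() are hypothetical names, and the pattern is a simplified illustration, not the regex from this PR.

```php
<?php
// A simplified, hypothetical pattern split with concatenation so that each
// part can carry its own comment.
const SKETCH_SELECTOR_RX = '/^(?:'
    . '[a-zA-Z0-9\x{00A0}-\x{FFFF}_^$|*=~\[\]()\-\s.,:#+>]' // plain selector characters
    . '|\\\\.'                                               // a backslash escape
    . "|'[^']*'"                                             // a single-quoted string
    . '|"[^"]*"'                                             // a double-quoted string
    . ')*$/u';

function looksLikeSelector(string $selector): bool
{
    return preg_match(SKETCH_SELECTOR_RX, $selector) === 1;
}

// Inside a character class these two patterns accept the same characters;
// the second keeps "^" away from the first position so it stays literal.
$escapedClass = '/^[\^\$\|\*\=\~\+\(\)]+$/';
$literalClass = '/^[$|*=~+()^]+$/';
```

The only escapes left in the character class are \[, \] and \-, plus the \s and \x{...} sequences, which are not escapes of literal characters.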

sabberworm

Thanks, looks good to me.

raxbg added 7 commits February 24, 2019 17:32

Add basic selector validation

2c5cd6f

Add | and * to the list of valid selector characters

218a207

Add ^ and $ to the list of valid selector characters (all special characters should be covered now)

8792d00

Skip parsing of the next ruleset upon finding an invalid selector instead aborting the parsing process. This seems to match the browsers' behavior

15f1fdb

Update closing bracket parsing in a way which allows us to better match the browsers' parsing behavior

4f8da97

Add unit test for invalid selectors

a06e5bb

Add percentage matching since keyframe steps are treated as regular selectors

4ea4fd6

MyIntervals deleted a comment May 8, 2019

Improved selector parsing + better handling for '}'

134f4e6

MyIntervals deleted a comment May 8, 2019

More tests for invalid selectors

5b3f780

raxbg added 4 commits June 26, 2019 19:11

Improvement: Handle escaped characters when validating selectors

425c78e

Match selectors with comments or quoted strings

75f7b13

Merge branch 'master' of github.com:sabberworm/PHP-CSS-Parser into improvement/selector_validation

385d559

Add forgotten test file

397ca39

sabberworm requested changes Jul 13, 2019

raxbg added 2 commits July 13, 2019 21:18

Move selector validation outside of the Selector's constructor

2943d9f

Simplify the CSS validating regex

a27e301

sabberworm approved these changes Jul 13, 2019

sabberworm merged commit 4ee910b into MyIntervals:master Jul 13, 2019

westonruter mentioned this pull request Apr 20, 2020

Some TailwindCSS class names are being removed ampproject/amp-wp#4609

Closed

This was referenced Dec 14, 2021

Failing tests with CSS Parser 8.4 MyIntervals/emogrifier#1129

Closed

[href=https://example.org] selector is discarded #347

Closed

crisp-tweakers mentioned this pull request Dec 24, 2021

@media block gives syntax error #352

Closed
